Experiment overview

Purpose: Briefly outline the purpose of your experiment.

Hypothesis: Outline the hypotheses if applicable.

Predictions:

- Prediction 1 (if applicable).
  - How it relates to hypothesis: describe how this prediction relates to the hypothesis.

Data: Briefly describe the type of data collected.


Data

To get digital distances: the TPS coordinates were in a long format, and I thought it would be easiest to locate the values needed for the Euclidean distance formula in a wide format. So I first cleaned up the original TPS file: I removed all curve data and unnecessary rows (i.e., LM=16, IMAGE=) to be left with a column of IDs, a column of landmark labels, and two columns for the coordinates.
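For reference, the cleanup step above could be sketched in R like this (I actually did it by hand; the file contents and the filtering pattern below are made up for illustration):

```r
# Hypothetical sketch of the TPS cleanup: drop header/metadata lines
# (LM=, IMAGE=, SCALE=) and keep only coordinate and ID lines.
# The file contents are invented for illustration.
tps_lines <- c("LM=3",
               "12.1 300.5",
               "15.9 310.2",
               "20.0 295.0",
               "IMAGE=fish_001.jpg",
               "ID=AD20-001",
               "SCALE=0.0107")

kept <- tps_lines[!grepl("^(LM=|IMAGE=|SCALE=)", tps_lines)]
kept  # three coordinate lines plus the ID line
```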

Importing that into R, I then used the pivot_wider() function to create one column for IDs, 16 columns for the X coordinates of the landmarks, and 16 columns for the Y coordinates. (I originally wanted a single coordinates column, with two rows per specimen, to minimize the number of columns, but this works.) I separated the result into two data frames, one with just the X coordinates and one with just the Y, so they would be easier to call.

I calculated the Euclidean distance (formula: sqrt((X_LM2 - X_LM1)^2 + (Y_LM2 - Y_LM1)^2)) for most of the same measurements (only missing total length and head width, as these were not possible in the digital photos). For head depth, I wasn't sure whether 2->11 or 2->12 was closer to what I measured by hand; likewise for body depth, I wasn't sure whether it was 3->10 or 3->11. I included both.

I finished all of the calculations, added the values to a data frame with the IDs from the original TPS file, and exported it to Excel.

Once in Excel, I realized that the values I had were in pixels (I think?) rather than mm, so I needed a way to convert them in order to compare to the hand measurements. I found that the scale factor (included in the original TPS file) is mm/pixel, so I added it to the Excel sheet and multiplied each value by it to get mm. However, the results were a bit off (e.g., 4.5 rather than 45), so I multiplied by 10 to get numbers that resembled the hand measurements. I'm not 100% sure this was correct, or why I would need to multiply by 10, but I'm going with it for now.
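The conversion can be written out explicitly. This is just a sketch: the scale factor and pixel distance are made-up values, and the extra ×10 is the unresolved fudge factor described above:

```r
# Hypothetical pixel-to-mm conversion using the TPS scale factor (mm/pixel).
# All values are invented for illustration.
scale_factor <- 0.001   # mm per pixel, as read from a TPS "SCALE=" line (assumed)
dist_px      <- 4500    # a digital distance in pixels
dist_mm      <- dist_px * scale_factor   # 4.5 -- an order of magnitude low
dist_mm_adj  <- dist_mm * 10             # 45  -- resembles the hand measurements
```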

library(tidyverse)
library(curl)

raw <- curl("https://raw.githubusercontent.com/allisondavis/morphology_analysis/refs/heads/master/dig-dist.csv")

raw <- read.csv(raw, header = TRUE, sep = ",", stringsAsFactors = TRUE)

raw1 <- raw %>% pivot_wider(names_from = LABELS, values_from = c(X.COR, Y.COR))

x <- raw1[, 1:17]
y <- raw1[, c(1, 18:33)]

HL.dig <- sqrt((x$X.COR_LM2 - x$X.COR_LM1)^2 + (y$Y.COR_LM2 - y$Y.COR_LM1)^2)
SL.dig <- sqrt((x$X.COR_LM6 - x$X.COR_LM1)^2 + (y$Y.COR_LM6 - y$Y.COR_LM1)^2)
PreDL.dig <- sqrt((x$X.COR_LM3 - x$X.COR_LM2)^2 + (y$Y.COR_LM3 - y$Y.COR_LM2)^2)
DbL.dig <- sqrt((x$X.COR_LM4 - x$X.COR_LM3)^2 + (y$Y.COR_LM4 - y$Y.COR_LM3)^2)
CPD.dig <- sqrt((x$X.COR_LM7 - x$X.COR_LM5)^2 + (y$Y.COR_LM7 - y$Y.COR_LM5)^2)
HD1.dig <- sqrt((x$X.COR_LM11 - x$X.COR_LM2)^2 + (y$Y.COR_LM11 - y$Y.COR_LM2)^2)
HD2.dig <- sqrt((x$X.COR_LM12 - x$X.COR_LM2)^2 + (y$Y.COR_LM12 - y$Y.COR_LM2)^2)
CPL.dig <- sqrt((x$X.COR_LM8 - x$X.COR_LM6)^2 + (y$Y.COR_LM8 - y$Y.COR_LM6)^2)
BD1.dig <- sqrt((x$X.COR_LM3 - x$X.COR_LM10)^2 + (y$Y.COR_LM3 - y$Y.COR_LM10)^2)
BD2.dig <- sqrt((x$X.COR_LM3 - x$X.COR_LM11)^2 + (y$Y.COR_LM3 - y$Y.COR_LM11)^2)
SnL.dig <- sqrt((x$X.COR_LM15 - x$X.COR_LM1)^2 + (y$Y.COR_LM15 - y$Y.COR_LM1)^2)
OL.dig <- sqrt((x$X.COR_LM16 - x$X.COR_LM15)^2 + (y$Y.COR_LM16 - y$Y.COR_LM15)^2)

raw2 <- cbind(raw1[,1], SL.dig, BD1.dig, BD2.dig, CPD.dig, CPL.dig, PreDL.dig, DbL.dig, HL.dig, HD1.dig, HD2.dig, SnL.dig, OL.dig)

library(writexl)

write_xlsx(raw2, "C:/Users/allis/OneDrive/Desktop/comparison_data.xlsx")

Performed most analyses in MorphoJ. Analyses here will strictly be for comparing digital vs hand measurements.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(curl)
## Using libcurl 8.3.0 with Schannel
## 
## Attaching package: 'curl'
## 
## The following object is masked from 'package:readr':
## 
##     parse_date
raw <- curl("https://raw.githubusercontent.com/allisondavis/morphology_analysis/refs/heads/master/comparison_data.csv")

raw <- read.csv(raw, header = TRUE, sep = ",", stringsAsFactors = TRUE)

# AD20-082 is missing all manual values, so I will remove it to avoid issues with NAs
raw <- raw[raw$ID != "AD20-082", ]

Normality check

I will use Shapiro-Wilk tests, histograms, and QQ plots to determine which traits are normally distributed.

I will separate my raw dataset into the two factors of interest: digital and manual measurements.

library(dplyr)

digital <- raw %>%
  select(ID, SPP, ends_with(".dig"))

manual <- raw %>%
  select(ID, SPP, ends_with(".man"))

Digital

All but OL fail the SW test, with a slight skew/deviation to the right.

##### Shapiro-Wilk test #####
shapiro.test(digital$SL)
## 
##  Shapiro-Wilk normality test
## 
## data:  digital$SL
## W = 0.97473, p-value = 1.165e-05
shapiro.test(digital$BD1)
## 
##  Shapiro-Wilk normality test
## 
## data:  digital$BD1
## W = 0.98196, p-value = 0.0002967
shapiro.test(digital$BD2)
## 
##  Shapiro-Wilk normality test
## 
## data:  digital$BD2
## W = 0.9907, p-value = 0.03071
shapiro.test(digital$CPD)
## 
##  Shapiro-Wilk normality test
## 
## data:  digital$CPD
## W = 0.96472, p-value = 2.6e-07
shapiro.test(digital$CPL)
## 
##  Shapiro-Wilk normality test
## 
## data:  digital$CPL
## W = 0.95444, p-value = 9.409e-09
shapiro.test(digital$PreDL)
## 
##  Shapiro-Wilk normality test
## 
## data:  digital$PreDL
## W = 0.98017, p-value = 0.0001268
shapiro.test(digital$DbL)
## 
##  Shapiro-Wilk normality test
## 
## data:  digital$DbL
## W = 0.9864, p-value = 0.002823
shapiro.test(digital$HL)
## 
##  Shapiro-Wilk normality test
## 
## data:  digital$HL
## W = 0.95243, p-value = 5.184e-09
shapiro.test(digital$HD1)
## 
##  Shapiro-Wilk normality test
## 
## data:  digital$HD1
## W = 0.9765, p-value = 2.463e-05
shapiro.test(digital$HD2)
## 
##  Shapiro-Wilk normality test
## 
## data:  digital$HD2
## W = 0.97148, p-value = 3.143e-06
shapiro.test(digital$SnL)
## 
##  Shapiro-Wilk normality test
## 
## data:  digital$SnL
## W = 0.97219, p-value = 4.145e-06
shapiro.test(digital$OL)
## 
##  Shapiro-Wilk normality test
## 
## data:  digital$OL
## W = 0.99609, p-value = 0.5705
##### Histograms #####

par(mfcol=c(2,2), oma = c(0,0,2,0))

hist(digital$SL, breaks=30)

hist(digital$BD1, breaks=30)

hist(digital$BD2, breaks=30)

hist(digital$CPD, breaks=30)

hist(digital$CPL, breaks=30)

hist(digital$PreDL, breaks=30)

hist(digital$DbL, breaks=30)

hist(digital$HL, breaks=30)

hist(digital$HD1, breaks=30)

hist(digital$HD2, breaks=30)

hist(digital$SnL, breaks=30)

hist(digital$OL, breaks=30)

##### QQ plots #####

par(mfcol=c(2,2), oma = c(0,0,2,0))

qqnorm(digital$SL)
qqline(digital$SL)

qqnorm(digital$BD1)
qqline(digital$BD1)

qqnorm(digital$BD2)
qqline(digital$BD2)

qqnorm(digital$CPD)
qqline(digital$CPD)

qqnorm(digital$CPL)
qqline(digital$CPL)

qqnorm(digital$PreDL)
qqline(digital$PreDL)

qqnorm(digital$DbL)
qqline(digital$DbL)

qqnorm(digital$HL)
qqline(digital$HL)

qqnorm(digital$HD1)
qqline(digital$HD1)

qqnorm(digital$HD2)
qqline(digital$HD2)

qqnorm(digital$SnL)
qqline(digital$SnL)

qqnorm(digital$OL)
qqline(digital$OL)

Manual

All fail the SW test, with a slight skew/deviation to the right.

##### Shapiro-Wilk test #####
shapiro.test(manual$SL)
## 
##  Shapiro-Wilk normality test
## 
## data:  manual$SL
## W = 0.98036, p-value = 0.0001385
shapiro.test(manual$BD1)
## 
##  Shapiro-Wilk normality test
## 
## data:  manual$BD1
## W = 0.96494, p-value = 2.806e-07
shapiro.test(manual$CPD)
## 
##  Shapiro-Wilk normality test
## 
## data:  manual$CPD
## W = 0.97037, p-value = 2.046e-06
shapiro.test(manual$CPL)
## 
##  Shapiro-Wilk normality test
## 
## data:  manual$CPL
## W = 0.97427, p-value = 9.619e-06
shapiro.test(manual$PreDL)
## 
##  Shapiro-Wilk normality test
## 
## data:  manual$PreDL
## W = 0.98037, p-value = 0.0001391
shapiro.test(manual$DbL)
## 
##  Shapiro-Wilk normality test
## 
## data:  manual$DbL
## W = 0.98347, p-value = 0.00062
shapiro.test(manual$HL)
## 
##  Shapiro-Wilk normality test
## 
## data:  manual$HL
## W = 0.97032, p-value = 2.009e-06
shapiro.test(manual$HD1)
## 
##  Shapiro-Wilk normality test
## 
## data:  manual$HD1
## W = 0.98378, p-value = 0.0007249
shapiro.test(manual$SnL)
## 
##  Shapiro-Wilk normality test
## 
## data:  manual$SnL
## W = 0.97615, p-value = 2.113e-05
shapiro.test(manual$OL)
## 
##  Shapiro-Wilk normality test
## 
## data:  manual$OL
## W = 0.98828, p-value = 0.007818
##### Histograms #####

par(mfcol=c(2,2), oma = c(0,0,2,0))

hist(manual$SL, breaks=30)

hist(manual$BD1, breaks=30)

hist(manual$CPD, breaks=30)

hist(manual$CPL, breaks=30)

hist(manual$PreDL, breaks=30)

hist(manual$DbL, breaks=30)

hist(manual$HL, breaks=30)

hist(manual$HD1, breaks=30)

hist(manual$SnL, breaks=30)

hist(manual$OL, breaks=30)


##### QQ plots #####

par(mfcol=c(2,2), oma = c(0,0,2,0))

qqnorm(manual$SL)
qqline(manual$SL)

qqnorm(manual$BD1)
qqline(manual$BD1)

qqnorm(manual$CPD)
qqline(manual$CPD)

qqnorm(manual$CPL)
qqline(manual$CPL)

qqnorm(manual$PreDL)
qqline(manual$PreDL)

qqnorm(manual$DbL)
qqline(manual$DbL)

qqnorm(manual$HL)
qqline(manual$HL)

qqnorm(manual$HD1)
qqline(manual$HD1)

qqnorm(manual$SnL)
qqline(manual$SnL)

qqnorm(manual$OL)
qqline(manual$OL)

Log transform & recheck

While some values are still significant, log transformation vastly improves normality (e.g., p = 2e-16 to p = 0.002). The histograms and QQ plots look pretty normal, so I will stick with the log-transformed values.

Log transformed data sets

dig_trans <- digital
dig_trans[, c(3:14)] <- log(dig_trans[, c(3:14)])

man_trans <- manual
man_trans[, c(3:12)] <- log(man_trans[, c(3:12)])

Digital

OL was the only trait that was normal in the raw check, but since it was not normal in the manual data, and I want to compare digital to manual, I will transform OL here too (I don't want to compare raw to transformed measurements).

##### Shapiro-Wilk test #####
shapiro.test(dig_trans$SL)
## 
##  Shapiro-Wilk normality test
## 
## data:  dig_trans$SL
## W = 0.99403, p-value = 0.2055
shapiro.test(dig_trans$BD1)
## 
##  Shapiro-Wilk normality test
## 
## data:  dig_trans$BD1
## W = 0.9938, p-value = 0.1809
shapiro.test(dig_trans$BD2)
## 
##  Shapiro-Wilk normality test
## 
## data:  dig_trans$BD2
## W = 0.98978, p-value = 0.0181
shapiro.test(dig_trans$CPD)
## 
##  Shapiro-Wilk normality test
## 
## data:  dig_trans$CPD
## W = 0.99187, p-value = 0.06017
shapiro.test(dig_trans$CPL)
## 
##  Shapiro-Wilk normality test
## 
## data:  dig_trans$CPL
## W = 0.99116, p-value = 0.03989
shapiro.test(dig_trans$PreDL)
## 
##  Shapiro-Wilk normality test
## 
## data:  dig_trans$PreDL
## W = 0.99151, p-value = 0.04877
shapiro.test(dig_trans$DbL)
## 
##  Shapiro-Wilk normality test
## 
## data:  dig_trans$DbL
## W = 0.98425, p-value = 0.0009222
shapiro.test(dig_trans$HL)
## 
##  Shapiro-Wilk normality test
## 
## data:  dig_trans$HL
## W = 0.99129, p-value = 0.04319
shapiro.test(dig_trans$HD1)
## 
##  Shapiro-Wilk normality test
## 
## data:  dig_trans$HD1
## W = 0.99687, p-value = 0.758
shapiro.test(dig_trans$HD2)
## 
##  Shapiro-Wilk normality test
## 
## data:  dig_trans$HD2
## W = 0.99479, p-value = 0.3068
shapiro.test(dig_trans$SnL)
## 
##  Shapiro-Wilk normality test
## 
## data:  dig_trans$SnL
## W = 0.99732, p-value = 0.8576
shapiro.test(dig_trans$OL)
## 
##  Shapiro-Wilk normality test
## 
## data:  dig_trans$OL
## W = 0.98989, p-value = 0.01936
##### Histograms #####

par(mfcol=c(2,2), oma = c(0,0,2,0))

hist(dig_trans$SL, breaks=30)

hist(dig_trans$BD1, breaks=30)

hist(dig_trans$BD2, breaks=30)

hist(dig_trans$CPD, breaks=30)

hist(dig_trans$CPL, breaks=30)

hist(dig_trans$PreDL, breaks=30)

hist(dig_trans$DbL, breaks=30)

hist(dig_trans$HL, breaks=30)

hist(dig_trans$HD1, breaks=30)

hist(dig_trans$HD2, breaks=30)

hist(dig_trans$SnL, breaks=30)

hist(dig_trans$OL, breaks=30)

##### QQ plots #####

par(mfcol=c(2,2), oma = c(0,0,2,0))

qqnorm(dig_trans$SL)
qqline(dig_trans$SL)

qqnorm(dig_trans$BD1)
qqline(dig_trans$BD1)

qqnorm(dig_trans$BD2)
qqline(dig_trans$BD2)

qqnorm(dig_trans$CPD)
qqline(dig_trans$CPD)

qqnorm(dig_trans$CPL)
qqline(dig_trans$CPL)

qqnorm(dig_trans$PreDL)
qqline(dig_trans$PreDL)

qqnorm(dig_trans$DbL)
qqline(dig_trans$DbL)

qqnorm(dig_trans$HL)
qqline(dig_trans$HL)

qqnorm(dig_trans$HD1)
qqline(dig_trans$HD1)

qqnorm(dig_trans$HD2)
qqline(dig_trans$HD2)

qqnorm(dig_trans$SnL)
qqline(dig_trans$SnL)

qqnorm(dig_trans$OL)
qqline(dig_trans$OL)

Manual

##### Shapiro-Wilk test #####
shapiro.test(man_trans$SL)
## 
##  Shapiro-Wilk normality test
## 
## data:  man_trans$SL
## W = 0.99424, p-value = 0.2297
shapiro.test(man_trans$BD1)
## 
##  Shapiro-Wilk normality test
## 
## data:  man_trans$BD1
## W = 0.99567, p-value = 0.4747
shapiro.test(man_trans$CPD)
## 
##  Shapiro-Wilk normality test
## 
## data:  man_trans$CPD
## W = 0.99598, p-value = 0.5428
shapiro.test(man_trans$CPL)
## 
##  Shapiro-Wilk normality test
## 
## data:  man_trans$CPL
## W = 0.98966, p-value = 0.01693
shapiro.test(man_trans$PreDL)
## 
##  Shapiro-Wilk normality test
## 
## data:  man_trans$PreDL
## W = 0.98692, p-value = 0.003722
shapiro.test(man_trans$DbL)
## 
##  Shapiro-Wilk normality test
## 
## data:  man_trans$DbL
## W = 0.98849, p-value = 0.008812
shapiro.test(man_trans$HL)
## 
##  Shapiro-Wilk normality test
## 
## data:  man_trans$HL
## W = 0.99081, p-value = 0.03262
shapiro.test(man_trans$HD1)
## 
##  Shapiro-Wilk normality test
## 
## data:  man_trans$HD1
## W = 0.99264, p-value = 0.09385
shapiro.test(man_trans$SnL)
## 
##  Shapiro-Wilk normality test
## 
## data:  man_trans$SnL
## W = 0.99033, p-value = 0.02488
shapiro.test(man_trans$OL)
## 
##  Shapiro-Wilk normality test
## 
## data:  man_trans$OL
## W = 0.97981, p-value = 0.0001072
##### Histograms #####

par(mfcol=c(2,2), oma = c(0,0,2,0))

hist(man_trans$SL, breaks=30)

hist(man_trans$BD1, breaks=30)

hist(man_trans$CPD, breaks=30)

hist(man_trans$CPL, breaks=30)

hist(man_trans$PreDL, breaks=30)

hist(man_trans$DbL, breaks=30)

hist(man_trans$HL, breaks=30)

hist(man_trans$HD1, breaks=30)

hist(man_trans$SnL, breaks=30)

hist(man_trans$OL, breaks=30)


##### QQ plots #####

par(mfcol=c(2,2), oma = c(0,0,2,0))

qqnorm(man_trans$SL)
qqline(man_trans$SL)

qqnorm(man_trans$BD1)
qqline(man_trans$BD1)

qqnorm(man_trans$CPD)
qqline(man_trans$CPD)

qqnorm(man_trans$CPL)
qqline(man_trans$CPL)

qqnorm(man_trans$PreDL)
qqline(man_trans$PreDL)

qqnorm(man_trans$DbL)
qqline(man_trans$DbL)

qqnorm(man_trans$HL)
qqline(man_trans$HL)

qqnorm(man_trans$HD1)
qqline(man_trans$HD1)

qqnorm(man_trans$SnL)
qqline(man_trans$SnL)

qqnorm(man_trans$OL)
qqline(man_trans$OL)

I will perform three different analyses: ANOVA, Bland-Altman, and PCA. These vary in whether they need long or wide formats. Also, the raw data contains columns that are not needed for these analyses (e.g., location or photo date), so I will remove those, leaving just the species, tag ID, and measurements.

Long format

library(tidyr)

#log transform entire data set

raw2 <- raw
raw2[, c(3:24)] <- log(raw2[, c(3:24)])

# this will create a data frame with species, ID, characteristic, method and value
df_long <- raw2 %>%
  pivot_longer(
    cols = ends_with(".dig") | ends_with(".man"),
    names_to = c("characteristic", "method"),
    names_sep = "\\.",
    values_to = "value"
  )

# method and characteristic are initially a character; need to be a factor
df_long$method <- as.factor(df_long$method)
df_long$characteristic <- as.factor(df_long$characteristic)

# one fish (AD20-082) is missing hand measurements; will remove

df_long <- df_long[df_long$ID !="AD20-082", ]


#create data frames that only have one measurement for BD and HD

## BD1 and HD1 for both digital and manual measurements 
df_long2 <- df_long[-grep("BD2", df_long$characteristic),]
df_long2 <- df_long2[-grep("HD2", df_long2$characteristic),]#only contains BD1 & HD1

## BD1/HD1 for manual (since they only have one) and BD2/HD2 for digital
df_long3 <- df_long %>%
    filter(
    # Keep everything for Hand measurements
    method == "man" |
    # Keep all digital measurements except BD1 and HD1
    !(method == "dig" & characteristic %in% c("BD1", "HD1"))
  )

#to ensure comparisons between BD2 and BD1 in this new data frame, will combine characteristic names to just BD or HD

df_long3<- df_long3 %>%
  mutate(
    characteristic = case_when(
      characteristic %in% c("BD1", "BD2") ~ "BD",
      characteristic %in% c("HD1", "HD2") ~ "HD",
      TRUE ~ characteristic  # Leave other characteristics unchanged
    )
  )

Wide format

df_wide <- df_long2 %>%
  pivot_wider(
    names_from = method,
    values_from = value
  )


df_wide2 <- df_long3 %>%
  pivot_wider(
    names_from = method,
    values_from = value
  )

ANOVA

Results: no significant difference between hand and digital measurements when using BD2/HD2; a suggestive difference between methods when using BD1/HD1 (p = 0.0584).

No effect of species (didn't expect there to be, but just in case).

# NOTE: analysis requires long format

#### for BD1 and HD1
anova_results <- aov(value~method * SPP, data = df_long2)
summary(anova_results)
##               Df Sum Sq Mean Sq F value Pr(>F)  
## method         1      2  2.0122   3.584 0.0584 .
## SPP            2      2  0.9491   1.691 0.1845  
## method:SPP     2      1  0.3224   0.574 0.5631  
## Residuals   6774   3803  0.5614                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova_results2 <- aov(value~method + SPP, data = df_long2)
summary(anova_results2)
##               Df Sum Sq Mean Sq F value Pr(>F)  
## method         1      2  2.0122   3.585 0.0584 .
## SPP            2      2  0.9491   1.691 0.1845  
## Residuals   6776   3804  0.5613                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova_results3 <- aov(value~method, data = df_long2)
summary(anova_results3)
##               Df Sum Sq Mean Sq F value Pr(>F)  
## method         1      2  2.0122   3.584 0.0584 .
## Residuals   6778   3805  0.5614                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#### for BD2 and HD2 in digital, BD1 and HD1 in hand
anova_results4 <- aov(value~method * SPP, data = df_long3)
summary(anova_results4)
##               Df Sum Sq Mean Sq F value Pr(>F)
## method         1      1  0.5344   0.931  0.335
## SPP            2      2  0.8128   1.416  0.243
## method:SPP     2      1  0.5398   0.941  0.390
## Residuals   6774   3888  0.5739
anova_results5 <- aov(value~method + SPP, data = df_long3)
summary(anova_results5)
##               Df Sum Sq Mean Sq F value Pr(>F)
## method         1      1  0.5344   0.931  0.335
## SPP            2      2  0.8128   1.416  0.243
## Residuals   6776   3889  0.5739
anova_results6 <- aov(value~method, data = df_long3)
summary(anova_results6)
##               Df Sum Sq Mean Sq F value Pr(>F)
## method         1      1  0.5344   0.931  0.335
## Residuals   6778   3890  0.5740

Bland-Altman Plot

Used to assess agreement between two measurement methods.
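The quantities plotted below, the bias (mean difference) and the 95% limits of agreement (bias ± 1.96 × SD of the differences), can be sketched as a small helper function. The input vectors here are invented for illustration:

```r
# Bland-Altman summary statistics: bias and 95% limits of agreement.
# The example measurement vectors are made up.
ba_stats <- function(dig, man) {
  diffs <- dig - man
  bias  <- mean(diffs, na.rm = TRUE)
  sdv   <- sd(diffs, na.rm = TRUE)
  list(bias      = bias,
       lower_loa = bias - 1.96 * sdv,
       upper_loa = bias + 1.96 * sdv)
}

res <- ba_stats(dig = c(10.2, 11.0, 9.8), man = c(10.0, 10.8, 10.1))
```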

Plots

BD/HD1

This analysis will only use BD1 and HD1 for both digital and manual measurements.

# requires wide data frame

library(ggplot2)

#compute averages and differences for Bland-Altman

df_ba <- df_wide %>%
  mutate(
    average = (dig + man) / 2,
    difference = dig - man
  )


#### Plot for whole dataset

ba_full <- ggplot(df_ba, aes(x = average, y = difference)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "solid", color = "black") +
  geom_hline(yintercept = mean(df_ba$difference, na.rm = TRUE), color = "red", linetype = "dashed") +  # Mean line
  geom_hline(yintercept = mean(df_ba$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(df_ba$difference, na.rm = TRUE),
             color = "blue", linetype = "dashed") +  # Limits of agreement
  labs(title = "Bland-Altman Plots for All Characteristics",
       x = "Average of Digital and Hand Measurements",
       y = "Difference (Digital - Hand)") +
  theme_minimal()

ba_full

The Bland-Altman plot for the whole dataset shows distinct clusters. This is most likely caused by the differences in range among the different characteristic types. We will look at species and characteristics to see which influences the clustering.

##### Species

ba_full_SPP <- ggplot(df_ba, aes(x = average, y = difference, color=SPP)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "solid", color = "black") +
  geom_hline(yintercept = mean(df_ba$difference, na.rm = TRUE), color = "red", linetype = "dashed") +  # Mean line
  geom_hline(yintercept = mean(df_ba$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(df_ba$difference, na.rm = TRUE),
             color = "blue", linetype = "dashed") +  # Limits of agreement
  labs(title = "Bland-Altman Plots for All Characteristics",
       x = "Average of Digital and Hand Measurements",
       y = "Difference (Digital - Hand)") +
  theme_minimal()

ba_full_SPP

##### Characteristics

ba_full_CR <- ggplot(df_ba, aes(x = average, y = difference, color=characteristic)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "solid", color = "black") +
  geom_hline(yintercept = mean(df_ba$difference, na.rm = TRUE), color = "red", linetype = "dashed") +  # Mean line
  geom_hline(yintercept = mean(df_ba$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(df_ba$difference, na.rm = TRUE),
             color = "blue", linetype = "dashed") +  # Limits of agreement
  labs(title = "Bland-Altman Plots for All Characteristics",
       x = "Average of Digital and Hand Measurements",
       y = "Difference (Digital - Hand)") +
  theme_minimal()

ba_full_CR

We indeed see that characteristics drive the clustering. We also see that the Pre-dorsal length (PreDL) data hover at and below the lower limit of agreement. This suggests either a measurement bias in one or both methods, or a biological difference, where certain species, sizes, or characteristics might be harder to measure accurately with one method than the other.

We’ll take a look at Bland-Altman plots for each characteristic now.

ba_by_char <- ggplot(df_ba, aes(x = average, y = difference)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "solid", color = "black") +
  geom_hline(yintercept = mean(df_ba$difference, na.rm = TRUE), color = "red", linetype = "dashed") +  # Mean line
  geom_hline(yintercept = mean(df_ba$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(df_ba$difference, na.rm = TRUE),
             color = "blue", linetype = "dashed") +  # Limits of agreement
  labs(title = "Bland-Altman Plots for All Characteristics",
       x = "Average of Digital and Hand Measurements",
       y = "Difference (Digital - Hand)") +
  facet_wrap(~ characteristic, scales = "free") +
  theme_minimal()

ba_by_char

### Check if there are species differences

ba_CR_SPP <- ggplot(df_ba, aes(x = average, y = difference, color=SPP)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "solid", color = "black") +
  geom_hline(yintercept = mean(df_ba$difference, na.rm = TRUE), color = "red", linetype = "dashed") +  # Mean line
  geom_hline(yintercept = mean(df_ba$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(df_ba$difference, na.rm = TRUE),
             color = "blue", linetype = "dashed") +  # Limits of agreement
  labs(title = "Bland-Altman Plots for All Characteristics",
       x = "Average of Digital and Hand Measurements",
       y = "Difference (Digital - Hand)") +
  facet_wrap(~ characteristic, scales = "free") +
  theme_minimal()

ba_CR_SPP

As before, we see that PreDL sits at or below the lower limit of agreement. When colored by species, we don't see any distinct clustering causing this positioning. Likewise, there is no distinct species clustering in any of the characteristics, with only a handful of outliers in each. This may be because some individual fish are much larger than the others.

BD/HD2

This analysis will use BD2 and HD2 for digital, but BD1 and HD1 for manual measurements.

# requires wide data frame

#compute averages and differences for Bland-Altman

df_ba2 <- df_wide2 %>%
  mutate(
    average = (dig + man) / 2,
    difference = dig - man
  )

#### Plot for whole dataset

ba_full2 <- ggplot(df_ba2, aes(x = average, y = difference)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "solid", color = "black") +
  geom_hline(yintercept = mean(df_ba2$difference, na.rm = TRUE), color = "red", linetype = "dashed") +  # Mean line
  geom_hline(yintercept = mean(df_ba2$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(df_ba2$difference, na.rm = TRUE),
             color = "blue", linetype = "dashed") +  # Limits of agreement
  labs(title = "Bland-Altman Plots for All Characteristics",
       x = "Average of Digital and Hand Measurements",
       y = "Difference (Digital - Hand)") +
  theme_minimal()

ba_full2

The Bland-Altman plot for the whole dataset again shows distinct clusters, most likely caused by the differences in range among the different characteristic types. We will look at species and characteristics to see which influences the clustering.

##### Species

ba_full_SPP2 <- ggplot(df_ba2, aes(x = average, y = difference, color=SPP)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "solid", color = "black") +
  geom_hline(yintercept = mean(df_ba2$difference, na.rm = TRUE), color = "red", linetype = "dashed") +  # Mean line
  geom_hline(yintercept = mean(df_ba2$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(df_ba2$difference, na.rm = TRUE),
             color = "blue", linetype = "dashed") +  # Limits of agreement
  labs(title = "Bland-Altman Plots for All Characteristics",
       x = "Average of Digital and Hand Measurements",
       y = "Difference (Digital - Hand)") +
  theme_minimal()

ba_full_SPP2

##### Characteristics

ba_full_CR2 <- ggplot(df_ba2, aes(x = average, y = difference, color=characteristic)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "solid", color = "black") +
  geom_hline(yintercept = mean(df_ba2$difference, na.rm = TRUE), color = "red", linetype = "dashed") +  # Mean line
  geom_hline(yintercept = mean(df_ba2$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(df_ba2$difference, na.rm = TRUE),
             color = "blue", linetype = "dashed") +  # Limits of agreement
  labs(title = "Bland-Altman Plots for All Characteristics",
       x = "Average of Digital and Hand Measurements",
       y = "Difference (Digital - Hand)") +
  theme_minimal()

ba_full_CR2

We indeed see that characteristics drive the clustering, and as before, the Pre-dorsal length (PreDL) data hover at and below the lower limit of agreement, suggesting either a measurement bias in one or both methods, or a biological difference, where certain species, sizes, or characteristics might be harder to measure accurately with one method than the other.

We’ll take a look at Bland-Altman plots for each characteristic now.

ba_by_char2 <- ggplot(df_ba2, aes(x = average, y = difference)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "solid", color = "black") +
  geom_hline(yintercept = mean(df_ba2$difference, na.rm = TRUE), color = "red", linetype = "dashed") +  # Mean line
  geom_hline(yintercept = mean(df_ba2$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(df_ba2$difference, na.rm = TRUE),
             color = "blue", linetype = "dashed") +  # Limits of agreement
  labs(title = "Bland-Altman Plots for All Characteristics",
       x = "Average of Digital and Hand Measurements",
       y = "Difference (Digital - Hand)") +
  facet_wrap(~ characteristic, scales = "free") +
  theme_minimal()

ba_by_char2

### Check if there are species differences

ba_CR_SPP2 <- ggplot(df_ba2, aes(x = average, y = difference, color=SPP)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "solid", color = "black") +
  geom_hline(yintercept = mean(df_ba2$difference, na.rm = TRUE), color = "red", linetype = "dashed") +  # Mean line
  geom_hline(yintercept = mean(df_ba2$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(df_ba2$difference, na.rm = TRUE),
             color = "blue", linetype = "dashed") +  # Limits of agreement
  labs(title = "Bland-Altman Plots for All Characteristics",
       x = "Average of Digital and Hand Measurements",
       y = "Difference (Digital - Hand)") +
  facet_wrap(~ characteristic, scales = "free") +
  theme_minimal()

ba_CR_SPP2

We’re really only interested in how BD1, HD1, BD2, and HD2 compare, so let’s isolate those.

data_BD1 <- df_wide %>%
  filter(characteristic == "BD1")
data_BD1 <- data_BD1 %>%
  mutate(
    average = (dig + man) / 2,
    difference = dig - man
  )

data_HD1 <- df_wide %>%
  filter(characteristic == "HD1")
data_HD1 <- data_HD1 %>%
  mutate(
    average = (dig + man) / 2,
    difference = dig - man
  )


data_BD2 <- df_wide2 %>%
  filter(characteristic == "BD")
data_BD2 <- data_BD2 %>%
  mutate(
    average = (dig + man) / 2,
    difference = dig - man
  )


data_HD2 <- df_wide2 %>%
  filter(characteristic == "HD")
data_HD2 <- data_HD2 %>%
  mutate(
    average = (dig + man) / 2,
    difference = dig - man
  )


ba_BD1 <- ggplot(data_BD1, aes(x = average, y = difference, color=SPP)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "solid", color = "black") +
  geom_hline(yintercept = mean(data_BD1$difference, na.rm = TRUE), color = "red", linetype = "dashed") +  # Mean line
  geom_hline(yintercept = mean(data_BD1$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(data_BD1$difference, na.rm = TRUE),
             color = "blue", linetype = "dashed") +  # Limits of agreement
  labs(title = "BD1",
       x = "Average of Digital and Hand Measurements",
       y = "Difference (Digital - Hand)") +
  theme_minimal()


ba_HD1 <- ggplot(data_HD1, aes(x = average, y = difference, color=SPP)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "solid", color = "black") +
  geom_hline(yintercept = mean(data_HD1$difference, na.rm = TRUE), color = "red", linetype = "dashed") +  # Mean line
  geom_hline(yintercept = mean(data_HD1$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(data_HD1$difference, na.rm = TRUE),
             color = "blue", linetype = "dashed") +  # Limits of agreement
  labs(title = "HD1",
       x = "Average of Digital and Hand Measurements",
       y = "Difference (Digital - Hand)") +
  theme_minimal()

ba_BD2 <- ggplot(data_BD2, aes(x = average, y = difference, color=SPP)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "solid", color = "black") +
  geom_hline(yintercept = mean(data_BD2$difference, na.rm = TRUE), color = "red", linetype = "dashed") +  # Mean line
  geom_hline(yintercept = mean(data_BD2$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(data_BD2$difference, na.rm = TRUE),
             color = "blue", linetype = "dashed") +  # Limits of agreement
  labs(title = "BD2",
       x = "Average of Digital and Hand Measurements",
       y = "Difference (Digital - Hand)") +
  theme_minimal()


ba_HD2 <- ggplot(data_HD2, aes(x = average, y = difference, color=SPP)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "solid", color = "black") +
  geom_hline(yintercept = mean(data_HD2$difference, na.rm = TRUE), color = "red", linetype = "dashed") +  # Mean line
  geom_hline(yintercept = mean(data_HD2$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(data_HD2$difference, na.rm = TRUE),
             color = "blue", linetype = "dashed") +  # Limits of agreement
  labs(title = "HD2",
       x = "Average of Digital and Hand Measurements",
       y = "Difference (Digital - Hand)") +
  theme_minimal()

library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
grid.arrange(ba_BD1, ba_BD2, ba_HD1, ba_HD2)

We see that using BD2 and HD2 increases the scatter and the number of points falling beyond the LOAs. This suggests that BD2 and HD2 are less comparable to the hand measurements than BD1 and HD1.

Stats

All together

NOTE: the looping below isn’t working because the structure of ‘stats’ does not contain the p-value (even though printing stats does show a p-value). Contacted the package author for more assistance.

unique_characteristics <- unique(df_long$characteristic)

# Function to apply blandr.statistics for each characteristic and extract the relevant stats
statistics_list <- lapply(unique_characteristics, function(char) {

  # Filter data for the current characteristic (two rows, one for each method)
  df_char <- df_long %>% filter(characteristic == char)

  # Apply blandr.statistics function to the Digital and Hand values
  stats <- blandr.statistics(
    df_char$value[df_char$method == "dig"],
    df_char$value[df_char$method == "man"]
  )

  # Extract desired statistics from the results
  p_value <- stats$p_value  # this is the element that appears to be missing
  mean_diff <- stats$bias
  lower_limit <- stats$lowerLOA
  upper_limit <- stats$upperLOA

  # Return a data frame with the statistics for this characteristic
  return(data.frame(
    Characteristic = char,
    p_value = p_value,
    mean_difference = mean_diff,
    lower_limit = lower_limit,
    upper_limit = upper_limit
  ))
})
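As a possible workaround while the missing `p_value` element is sorted out: the p-value blandr reports appears to be a one-sample t-test of the paired differences against zero (its df equals n − 1 in the output below), so it can be recomputed directly. A minimal sketch with toy vectors (the real code would use the `df_char` values instead):

```r
# Toy digital and hand values standing in for one characteristic
dig <- c(2.72, 2.53, 2.82, 2.33, 2.64)
man <- c(3.10, 3.04, 3.25, 2.83, 3.05)

# One-sample t-test of the differences against zero
diffs <- dig - man
p_value <- t.test(diffs, mu = 0)$p.value
mean_diff <- mean(diffs)
```

This sidesteps the blandr object entirely, so the loop no longer depends on which elements that list exposes.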

Individual

Nearly all characteristics (all except HD2 and OL) show statistically significant bias. HOWEVER, the bias is INCREDIBLY close to zero (suggesting no practical difference between the methods) and most LOAs include 0 (which also suggests that no difference between the methods is plausible). The exceptions:

  • BD2 LOA does not contain 0 (0.1-0.6)
  • PreDL LOA does not contain 0 (-0.7 to -0.2)
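The zero-containment check can also be done programmatically. A minimal sketch, with the LOA values hand-copied from the blandr output below (an interval that excludes zero indicates a systematic offset between methods):

```r
# LOA intervals for a few characteristics, copied from the blandr output
loa <- data.frame(
  characteristic = c("BD1", "BD2", "HD1", "HD2", "PreDL"),
  lowerLOA = c(-0.0216, 0.1129, -0.0945, -0.2147, -0.6842),
  upperLOA = c( 0.2632, 0.5986,  0.2148,  0.1990, -0.2405)
)

# TRUE when the interval spans zero (methods plausibly agree)
loa$contains_zero <- loa$lowerLOA <= 0 & loa$upperLOA >= 0
```

With these values, only BD2 and PreDL come out FALSE, matching the two bullets above.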
library(blandr)
## Warning: package 'blandr' was built under R version 4.4.2
### Whole dataset

(bar_full <- blandr.statistics(df_wide$dig, df_wide$man))
## Bland-Altman Statistics
## =======================
## t = -9.7766, df = 3389, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
## 
## =======================
## Number of comparisons:  3390 
## Maximum value for average measures:  4.042436 
## Minimum value for average measures:  0.3834636 
## Maximum value for difference in measures:  0.5562302 
## Minimum value for difference in measures:  -0.9342704 
## 
## Bias:  -0.03445528 
## Standard deviation of bias:  0.2051953 
## 
## Standard error of bias:  0.003524258 
## Standard error for limits of agreement:  0.00602359 
## 
## Bias:  -0.03445528 
## Bias- upper 95% CI:  -0.02754539 
## Bias- lower 95% CI:  -0.04136516 
## 
## Upper limit of agreement:  0.3677276 
## Upper LOA- upper 95% CI:  0.3795378 
## Upper LOA- lower 95% CI:  0.3559173 
## 
## Lower limit of agreement:  -0.4366381 
## Lower LOA- upper 95% CI:  -0.4248279 
## Lower LOA- lower 95% CI:  -0.4484484 
## 
## =======================
## Derived measures:  
## Mean of differences/means:  -2.604907 
## Point estimate of bias as proportion of lowest average:  -8.985281 
## Point estimate of bias as proportion of highest average -0.8523395 
## Spread of data between lower and upper LoAs:  0.8043657 
## Bias as proportion of LoA spread:  -4.283534 
## 
## =======================
## Bias: 
##  -0.03445528  ( -0.04136516  to  -0.02754539 ) 
## ULoA: 
##  0.3677276  ( 0.3559173  to  0.3795378 ) 
## LLoA: 
##  -0.4366381  ( -0.4484484  to  -0.4248279 )
(bar_full2 <- blandr.statistics(df_wide2$dig, df_wide2$man))
## Bland-Altman Statistics
## =======================
## t = -4.3957, df = 3389, p-value = 1.138e-05
## alternative hypothesis: true bias is not equal to 0
## 
## =======================
## Number of comparisons:  3390 
## Maximum value for average measures:  4.042436 
## Minimum value for average measures:  0.3834636 
## Maximum value for difference in measures:  0.746832 
## Minimum value for difference in measures:  -0.9342704 
## 
## Bias:  -0.01775692 
## Standard deviation of bias:  0.2352018 
## 
## Standard error of bias:  0.004039623 
## Standard error for limits of agreement:  0.006904443 
## 
## Bias:  -0.01775692 
## Bias- upper 95% CI:  -0.009836577 
## Bias- lower 95% CI:  -0.02567727 
## 
## Upper limit of agreement:  0.4432387 
## Upper LOA- upper 95% CI:  0.456776 
## Upper LOA- lower 95% CI:  0.4297014 
## 
## Lower limit of agreement:  -0.4787525 
## Lower LOA- upper 95% CI:  -0.4652152 
## Lower LOA- lower 95% CI:  -0.4922898 
## 
## =======================
## Derived measures:  
## Mean of differences/means:  -2.042943 
## Point estimate of bias as proportion of lowest average:  -4.630667 
## Point estimate of bias as proportion of highest average -0.4392629 
## Spread of data between lower and upper LoAs:  0.9219912 
## Bias as proportion of LoA spread:  -1.925932 
## 
## =======================
## Bias: 
##  -0.01775692  ( -0.02567727  to  -0.009836577 ) 
## ULoA: 
##  0.4432387  ( 0.4297014  to  0.456776 ) 
## LLoA: 
##  -0.4787525  ( -0.4922898  to  -0.4652152 )
### By characteristic

df_SL <- df_wide %>%
  filter(characteristic == "SL")
(bar_SL <- blandr.statistics(df_SL$dig, df_SL$man))
## Bland-Altman Statistics
## =======================
## t = 23.922, df = 338, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
## 
## =======================
## Number of comparisons:  339 
## Maximum value for average measures:  4.042436 
## Minimum value for average measures:  2.960844 
## Maximum value for difference in measures:  0.1785351 
## Minimum value for difference in measures:  -0.2129669 
## 
## Bias:  0.05387797 
## Standard deviation of bias:  0.04146884 
## 
## Standard error of bias:  0.002252278 
## Standard error for limits of agreement:  0.003852918 
## 
## Bias:  0.05387797 
## Bias- upper 95% CI:  0.05830822 
## Bias- lower 95% CI:  0.04944772 
## 
## Upper limit of agreement:  0.1351569 
## Upper LOA- upper 95% CI:  0.1427356 
## Upper LOA- lower 95% CI:  0.1275782 
## 
## Lower limit of agreement:  -0.02740096 
## Lower LOA- upper 95% CI:  -0.01982224 
## Lower LOA- lower 95% CI:  -0.03497967 
## 
## =======================
## Derived measures:  
## Mean of differences/means:  1.517899 
## Point estimate of bias as proportion of lowest average:  1.819683 
## Point estimate of bias as proportion of highest average 1.332809 
## Spread of data between lower and upper LoAs:  0.1625579 
## Bias as proportion of LoA spread:  33.14387 
## 
## =======================
## Bias: 
##  0.05387797  ( 0.04944772  to  0.05830822 ) 
## ULoA: 
##  0.1351569  ( 0.1275782  to  0.1427356 ) 
## LLoA: 
##  -0.02740096  ( -0.03497967  to  -0.01982224 )
df_BD1 <- df_wide %>%
  filter(characteristic == "BD1")
(bar_BD1 <- blandr.statistics(df_BD1$dig, df_BD1$man))
## Bland-Altman Statistics
## =======================
## t = 30.615, df = 338, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
## 
## =======================
## Number of comparisons:  339 
## Maximum value for average measures:  3.121966 
## Minimum value for average measures:  1.748914 
## Maximum value for difference in measures:  0.4951842 
## Minimum value for difference in measures:  -0.2418473 
## 
## Bias:  0.1207813 
## Standard deviation of bias:  0.07263929 
## 
## Standard error of bias:  0.003945225 
## Standard error for limits of agreement:  0.006749001 
## 
## Bias:  0.1207813 
## Bias- upper 95% CI:  0.1285416 
## Bias- lower 95% CI:  0.113021 
## 
## Upper limit of agreement:  0.2631543 
## Upper LOA- upper 95% CI:  0.2764297 
## Upper LOA- lower 95% CI:  0.249879 
## 
## Lower limit of agreement:  -0.02159169 
## Lower LOA- upper 95% CI:  -0.00831636 
## Lower LOA- lower 95% CI:  -0.03486703 
## 
## =======================
## Derived measures:  
## Mean of differences/means:  4.995667 
## Point estimate of bias as proportion of lowest average:  6.906076 
## Point estimate of bias as proportion of highest average 3.868758 
## Spread of data between lower and upper LoAs:  0.284746 
## Bias as proportion of LoA spread:  42.41721 
## 
## =======================
## Bias: 
##  0.1207813  ( 0.113021  to  0.1285416 ) 
## ULoA: 
##  0.2631543  ( 0.249879  to  0.2764297 ) 
## LLoA: 
##  -0.02159169  ( -0.03486703  to  -0.00831636 )
df_BD2 <- df_wide2 %>%
  filter(characteristic == "BD")
(bar_BD2 <- blandr.statistics(df_BD2$dig, df_BD2$man))
## Bland-Altman Statistics
## =======================
## t = 52.87, df = 338, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
## 
## =======================
## Number of comparisons:  339 
## Maximum value for average measures:  3.163316 
## Minimum value for average measures:  1.876811 
## Maximum value for difference in measures:  0.746832 
## Minimum value for difference in measures:  -0.09538079 
## 
## Bias:  0.3557936 
## Standard deviation of bias:  0.1239048 
## 
## Standard error of bias:  0.006729585 
## Standard error for limits of agreement:  0.01151214 
## 
## Bias:  0.3557936 
## Bias- upper 95% CI:  0.3690308 
## Bias- lower 95% CI:  0.3425565 
## 
## Upper limit of agreement:  0.598647 
## Upper LOA- upper 95% CI:  0.6212915 
## Upper LOA- lower 95% CI:  0.5760026 
## 
## Lower limit of agreement:  0.1129402 
## Lower LOA- upper 95% CI:  0.1355847 
## Lower LOA- lower 95% CI:  0.09029573 
## 
## =======================
## Derived measures:  
## Mean of differences/means:  14.04186 
## Point estimate of bias as proportion of lowest average:  18.95735 
## Point estimate of bias as proportion of highest average 11.24749 
## Spread of data between lower and upper LoAs:  0.4857068 
## Bias as proportion of LoA spread:  73.25275 
## 
## =======================
## Bias: 
##  0.3557936  ( 0.3425565  to  0.3690308 ) 
## ULoA: 
##  0.598647  ( 0.5760026  to  0.6212915 ) 
## LLoA: 
##  0.1129402  ( 0.09029573  to  0.1355847 )
df_CPD <- df_wide %>%
  filter(characteristic == "CPD")
(bar_CPD <- blandr.statistics(df_CPD$dig, df_CPD$man))
## Bland-Altman Statistics
## =======================
## t = 27.399, df = 338, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
## 
## =======================
## Number of comparisons:  339 
## Maximum value for average measures:  2.471218 
## Minimum value for average measures:  1.170403 
## Maximum value for difference in measures:  0.5562302 
## Minimum value for difference in measures:  -0.1550393 
## 
## Bias:  0.101579 
## Standard deviation of bias:  0.06825963 
## 
## Standard error of bias:  0.003707354 
## Standard error for limits of agreement:  0.006342082 
## 
## Bias:  0.101579 
## Bias- upper 95% CI:  0.1088714 
## Bias- lower 95% CI:  0.09428661 
## 
## Upper limit of agreement:  0.2353679 
## Upper LOA- upper 95% CI:  0.2478428 
## Upper LOA- lower 95% CI:  0.222893 
## 
## Lower limit of agreement:  -0.03220987 
## Lower LOA- upper 95% CI:  -0.01973495 
## Lower LOA- lower 95% CI:  -0.04468479 
## 
## =======================
## Derived measures:  
## Mean of differences/means:  5.715623 
## Point estimate of bias as proportion of lowest average:  8.678974 
## Point estimate of bias as proportion of highest average 4.110483 
## Spread of data between lower and upper LoAs:  0.2675778 
## Bias as proportion of LoA spread:  37.96243 
## 
## =======================
## Bias: 
##  0.101579  ( 0.09428661  to  0.1088714 ) 
## ULoA: 
##  0.2353679  ( 0.222893  to  0.2478428 ) 
## LLoA: 
##  -0.03220987  ( -0.04468479  to  -0.01973495 )
df_CPL <- df_wide %>%
  filter(characteristic == "CPL")
(bar_CPL <- blandr.statistics(df_CPL$dig, df_CPL$man))
## Bland-Altman Statistics
## =======================
## t = 30.741, df = 338, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
## 
## =======================
## Number of comparisons:  339 
## Maximum value for average measures:  2.890711 
## Minimum value for average measures:  1.76216 
## Maximum value for difference in measures:  0.3986366 
## Minimum value for difference in measures:  -0.19231 
## 
## Bias:  0.1409026 
## Standard deviation of bias:  0.08439069 
## 
## Standard error of bias:  0.004583473 
## Standard error for limits of agreement:  0.007840837 
## 
## Bias:  0.1409026 
## Bias- upper 95% CI:  0.1499183 
## Bias- lower 95% CI:  0.1318869 
## 
## Upper limit of agreement:  0.3063084 
## Upper LOA- upper 95% CI:  0.3217314 
## Upper LOA- lower 95% CI:  0.2908854 
## 
## Lower limit of agreement:  -0.02450314 
## Lower LOA- upper 95% CI:  -0.00908016 
## Lower LOA- lower 95% CI:  -0.03992613 
## 
## =======================
## Derived measures:  
## Mean of differences/means:  5.963766 
## Point estimate of bias as proportion of lowest average:  7.996017 
## Point estimate of bias as proportion of highest average 4.874324 
## Spread of data between lower and upper LoAs:  0.3308115 
## Bias as proportion of LoA spread:  42.59302 
## 
## =======================
## Bias: 
##  0.1409026  ( 0.1318869  to  0.1499183 ) 
## ULoA: 
##  0.3063084  ( 0.2908854  to  0.3217314 ) 
## LLoA: 
##  -0.02450314  ( -0.03992613  to  -0.00908016 )
df_PreDL <- df_wide %>%
  filter(characteristic == "PreDL")
(bar_PreDL <- blandr.statistics(df_PreDL$dig, df_PreDL$man))
## Bland-Altman Statistics
## =======================
## t = -75.204, df = 338, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
## 
## =======================
## Number of comparisons:  339 
## Maximum value for average measures:  3.204808 
## Minimum value for average measures:  2.05628 
## Maximum value for difference in measures:  0.2376452 
## Minimum value for difference in measures:  -0.9116021 
## 
## Bias:  -0.4623168 
## Standard deviation of bias:  0.113188 
## 
## Standard error of bias:  0.006147531 
## Standard error for limits of agreement:  0.01051643 
## 
## Bias:  -0.4623168 
## Bias- upper 95% CI:  -0.4502246 
## Bias- lower 95% CI:  -0.4744091 
## 
## Upper limit of agreement:  -0.2404683 
## Upper LOA- upper 95% CI:  -0.2197824 
## Upper LOA- lower 95% CI:  -0.2611542 
## 
## Lower limit of agreement:  -0.6841654 
## Lower LOA- upper 95% CI:  -0.6634795 
## Lower LOA- lower 95% CI:  -0.7048513 
## 
## =======================
## Derived measures:  
## Mean of differences/means:  -17.19129 
## Point estimate of bias as proportion of lowest average:  -22.48317 
## Point estimate of bias as proportion of highest average -14.42572 
## Spread of data between lower and upper LoAs:  0.4436971 
## Bias as proportion of LoA spread:  -104.1965 
## 
## =======================
## Bias: 
##  -0.4623168  ( -0.4744091  to  -0.4502246 ) 
## ULoA: 
##  -0.2404683  ( -0.2611542  to  -0.2197824 ) 
## LLoA: 
##  -0.6841654  ( -0.7048513  to  -0.6634795 )
df_DbL <- df_wide %>%
  filter(characteristic == "DbL")
(bar_DbL <- blandr.statistics(df_DbL$dig, df_DbL$man))
## Bland-Altman Statistics
## =======================
## t = -2.8434, df = 338, p-value = 0.004736
## alternative hypothesis: true bias is not equal to 0
## 
## =======================
## Number of comparisons:  339 
## Maximum value for average measures:  2.786619 
## Minimum value for average measures:  1.117524 
## Maximum value for difference in measures:  0.2886718 
## Minimum value for difference in measures:  -0.9342704 
## 
## Bias:  -0.01723212 
## Standard deviation of bias:  0.1115855 
## 
## Standard error of bias:  0.006060494 
## Standard error for limits of agreement:  0.01036754 
## 
## Bias:  -0.01723212 
## Bias- upper 95% CI:  -0.005311085 
## Bias- lower 95% CI:  -0.02915316 
## 
## Upper limit of agreement:  0.2014755 
## Upper LOA- upper 95% CI:  0.2218686 
## Upper LOA- lower 95% CI:  0.1810825 
## 
## Lower limit of agreement:  -0.2359398 
## Lower LOA- upper 95% CI:  -0.2155467 
## Lower LOA- lower 95% CI:  -0.2563328 
## 
## =======================
## Derived measures:  
## Mean of differences/means:  -1.003452 
## Point estimate of bias as proportion of lowest average:  -1.541991 
## Point estimate of bias as proportion of highest average -0.6183882 
## Spread of data between lower and upper LoAs:  0.4374153 
## Bias as proportion of LoA spread:  -3.939533 
## 
## =======================
## Bias: 
##  -0.01723212  ( -0.02915316  to  -0.005311085 ) 
## ULoA: 
##  0.2014755  ( 0.1810825  to  0.2218686 ) 
## LLoA: 
##  -0.2359398  ( -0.2563328  to  -0.2155467 )
df_HL <- df_wide %>%
  filter(characteristic == "HL")
(bar_HL <- blandr.statistics(df_HL$dig, df_HL$man))
## Bland-Altman Statistics
## =======================
## t = -16.151, df = 338, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
## 
## =======================
## Number of comparisons:  339 
## Maximum value for average measures:  2.565974 
## Minimum value for average measures:  1.480514 
## Maximum value for difference in measures:  0.528404 
## Minimum value for difference in measures:  -0.7045543 
## 
## Bias:  -0.1433821 
## Standard deviation of bias:  0.163451 
## 
## Standard error of bias:  0.008877438 
## Standard error for limits of agreement:  0.01518642 
## 
## Bias:  -0.1433821 
## Bias- upper 95% CI:  -0.1259201 
## Bias- lower 95% CI:  -0.1608441 
## 
## Upper limit of agreement:  0.1769818 
## Upper LOA- upper 95% CI:  0.2068536 
## Upper LOA- lower 95% CI:  0.14711 
## 
## Lower limit of agreement:  -0.463746 
## Lower LOA- upper 95% CI:  -0.4338742 
## Lower LOA- lower 95% CI:  -0.4936178 
## 
## =======================
## Derived measures:  
## Mean of differences/means:  -7.262377 
## Point estimate of bias as proportion of lowest average:  -9.684617 
## Point estimate of bias as proportion of highest average -5.587823 
## Spread of data between lower and upper LoAs:  0.6407278 
## Bias as proportion of LoA spread:  -22.378 
## 
## =======================
## Bias: 
##  -0.1433821  ( -0.1608441  to  -0.1259201 ) 
## ULoA: 
##  0.1769818  ( 0.14711  to  0.2068536 ) 
## LLoA: 
##  -0.463746  ( -0.4936178  to  -0.4338742 )
df_HD1 <- df_wide %>%
  filter(characteristic == "HD1")
(bar_HD1 <- blandr.statistics(df_HD1$dig, df_HD1$man))
## Bland-Altman Statistics
## =======================
## t = 14.044, df = 338, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
## 
## =======================
## Number of comparisons:  339 
## Maximum value for average measures:  2.601825 
## Minimum value for average measures:  1.451618 
## Maximum value for difference in measures:  0.4229753 
## Minimum value for difference in measures:  -0.2031489 
## 
## Bias:  0.06017856 
## Standard deviation of bias:  0.07889485 
## 
## Standard error of bias:  0.00428498 
## Standard error for limits of agreement:  0.007330212 
## 
## Bias:  0.06017856 
## Bias- upper 95% CI:  0.06860715 
## Bias- lower 95% CI:  0.05174997 
## 
## Upper limit of agreement:  0.2148125 
## Upper LOA- upper 95% CI:  0.229231 
## Upper LOA- lower 95% CI:  0.2003939 
## 
## Lower limit of agreement:  -0.09445534 
## Lower LOA- upper 95% CI:  -0.08003676 
## Lower LOA- lower 95% CI:  -0.1088739 
## 
## =======================
## Derived measures:  
## Mean of differences/means:  3.007816 
## Point estimate of bias as proportion of lowest average:  4.14562 
## Point estimate of bias as proportion of highest average 2.312937 
## Spread of data between lower and upper LoAs:  0.3092678 
## Bias as proportion of LoA spread:  19.4584 
## 
## =======================
## Bias: 
##  0.06017856  ( 0.05174997  to  0.06860715 ) 
## ULoA: 
##  0.2148125  ( 0.2003939  to  0.229231 ) 
## LLoA: 
##  -0.09445534  ( -0.1088739  to  -0.08003676 )
df_HD2 <- df_wide2 %>%
  filter(characteristic == "HD")
(bar_HD2 <- blandr.statistics(df_HD2$dig, df_HD2$man))
## Bland-Altman Statistics
## =======================
## t = -1.3693, df = 338, p-value = 0.1718
## alternative hypothesis: true bias is not equal to 0
## 
## =======================
## Number of comparisons:  339 
## Maximum value for average measures:  2.566481 
## Minimum value for average measures:  1.391822 
## Maximum value for difference in measures:  0.4478337 
## Minimum value for difference in measures:  -0.2905665 
## 
## Bias:  -0.007850176 
## Standard deviation of bias:  0.1055546 
## 
## Standard error of bias:  0.005732938 
## Standard error for limits of agreement:  0.009807199 
## 
## Bias:  -0.007850176 
## Bias- upper 95% CI:  0.003426556 
## Bias- lower 95% CI:  -0.01912691 
## 
## Upper limit of agreement:  0.1990368 
## Upper LOA- upper 95% CI:  0.2183276 
## Upper LOA- lower 95% CI:  0.179746 
## 
## Lower limit of agreement:  -0.2147372 
## Lower LOA- upper 95% CI:  -0.1954463 
## Lower LOA- lower 95% CI:  -0.234028 
## 
## =======================
## Derived measures:  
## Mean of differences/means:  -0.4187308 
## Point estimate of bias as proportion of lowest average:  -0.5640215 
## Point estimate of bias as proportion of highest average -0.3058731 
## Spread of data between lower and upper LoAs:  0.413774 
## Bias as proportion of LoA spread:  -1.897213 
## 
## =======================
## Bias: 
##  -0.007850176  ( -0.01912691  to  0.003426556 ) 
## ULoA: 
##  0.1990368  ( 0.179746  to  0.2183276 ) 
## LLoA: 
##  -0.2147372  ( -0.234028  to  -0.1954463 )
df_SnL <- df_wide %>%
  filter(characteristic == "SnL")
(bar_SnL <- blandr.statistics(df_SnL$dig, df_SnL$man))
## Bland-Altman Statistics
## =======================
## t = -22.082, df = 338, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
## 
## =======================
## Number of comparisons:  339 
## Maximum value for average measures:  1.547597 
## Minimum value for average measures:  0.3834636 
## Maximum value for difference in measures:  0.2909172 
## Minimum value for difference in measures:  -0.7671152 
## 
## Bias:  -0.1897858 
## Standard deviation of bias:  0.158242 
## 
## Standard error of bias:  0.008594523 
## Standard error for limits of agreement:  0.01470244 
## 
## Bias:  -0.1897858 
## Bias- upper 95% CI:  -0.1728803 
## Bias- lower 95% CI:  -0.2066913 
## 
## Upper limit of agreement:  0.1203684 
## Upper LOA- upper 95% CI:  0.1492882 
## Upper LOA- lower 95% CI:  0.0914486 
## 
## Lower limit of agreement:  -0.49994 
## Lower LOA- upper 95% CI:  -0.4710202 
## Lower LOA- lower 95% CI:  -0.5288599 
## 
## =======================
## Derived measures:  
## Mean of differences/means:  -21.11482 
## Point estimate of bias as proportion of lowest average:  -49.49253 
## Point estimate of bias as proportion of highest average -12.26326 
## Spread of data between lower and upper LoAs:  0.6203085 
## Bias as proportion of LoA spread:  -30.59539 
## 
## =======================
## Bias: 
##  -0.1897858  ( -0.2066913  to  -0.1728803 ) 
## ULoA: 
##  0.1203684  ( 0.0914486  to  0.1492882 ) 
## LLoA: 
##  -0.49994  ( -0.5288599  to  -0.4710202 )
df_OL <- df_wide %>%
  filter(characteristic == "OL")
(bar_OL <- blandr.statistics(df_OL$dig, df_OL$man))
## Bland-Altman Statistics
## =======================
## t = -1.7645, df = 338, p-value = 0.07855
## alternative hypothesis: true bias is not equal to 0
## 
## =======================
## Number of comparisons:  339 
## Maximum value for average measures:  1.471467 
## Minimum value for average measures:  0.6198813 
## Maximum value for difference in measures:  0.4116851 
## Minimum value for difference in measures:  -0.3320279 
## 
## Bias:  -0.009155384 
## Standard deviation of bias:  0.09553187 
## 
## Standard error of bias:  0.005188579 
## Standard error for limits of agreement:  0.008875977 
## 
## Bias:  -0.009155384 
## Bias- upper 95% CI:  0.001050588 
## Bias- lower 95% CI:  -0.01936136 
## 
## Upper limit of agreement:  0.1780871 
## Upper LOA- upper 95% CI:  0.1955462 
## Upper LOA- lower 95% CI:  0.160628 
## 
## Lower limit of agreement:  -0.1963978 
## Lower LOA- upper 95% CI:  -0.1789387 
## Lower LOA- lower 95% CI:  -0.213857 
## 
## =======================
## Derived measures:  
## Mean of differences/means:  -0.6778975 
## Point estimate of bias as proportion of lowest average:  -1.476958 
## Point estimate of bias as proportion of highest average -0.6221944 
## Spread of data between lower and upper LoAs:  0.3744849 
## Bias as proportion of LoA spread:  -2.444794 
## 
## =======================
## Bias: 
##  -0.009155384  ( -0.01936136  to  0.001050588 ) 
## ULoA: 
##  0.1780871  ( 0.160628  to  0.1955462 ) 
## LLoA: 
##  -0.1963978  ( -0.213857  to  -0.1789387 )
df_predl <- df_wide %>%
  filter(characteristic == "PreDL")

# Separate Digital and Hand values
digital_values <- df_predl$dig
hand_values <- df_predl$man

# Apply the blandr.statistics function
stats <- blandr.statistics(method1 = digital_values, method2 = hand_values)

# Extract the lower and upper limits of agreement
# NOTE: these assignments are swapped -- lower_limit receives the upper LOA
# and upper_limit the lower LOA -- so the filter below keeps nearly every
# row rather than the true outliers, and the printed limits are reversed.
lower_limit <- stats$upperLOA
upper_limit <- stats$lowerLOA

# Calculate differences and filter for outliers
df_predl <- df_predl %>%
  mutate(diff = dig - man) %>%
  filter(diff < lower_limit | diff > upper_limit)

# View the outliers for PreDL
print(df_predl)
## # A tibble: 339 × 6
##    ID       SPP   characteristic   dig   man   diff
##    <fct>    <fct> <fct>          <dbl> <dbl>  <dbl>
##  1 AD19-001 p.lat PreDL           2.72  3.10 -0.384
##  2 AD19-002 p.lat PreDL           2.53  3.04 -0.506
##  3 AD19-003 p.lat PreDL           2.82  3.25 -0.433
##  4 AD19-004 p.lat PreDL           2.33  2.83 -0.500
##  5 AD19-005 p.lat PreDL           2.64  3.05 -0.406
##  6 AD19-006 p.lat PreDL           2.62  3.02 -0.405
##  7 AD19-007 p.lat PreDL           2.44  2.91 -0.466
##  8 AD19-008 p.lat PreDL           2.45  2.86 -0.412
##  9 AD19-009 p.lat PreDL           2.33  2.91 -0.574
## 10 AD19-010 p.lat PreDL           2.63  3.04 -0.414
## # ℹ 329 more rows
cat("Mean Difference:", mean(df_predl$diff), "\n")
## Mean Difference: -0.4623168
cat("Lower Limit:", lower_limit, "\n")
## Lower Limit: -0.2404683
cat("Upper Limit:", upper_limit, "\n")
## Upper Limit: -0.6841654
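For reference, a minimal sketch of the outlier filter with the limits assigned the right way round, so that only points falling outside the LOAs are kept (toy differences shown, not the real PreDL values):

```r
# Toy paired differences; in the real analysis these would come from
# stats$lowerLOA and stats$upperLOA (lowerLOA is the more negative bound)
toy <- data.frame(ID = 1:5, diff = c(-0.45, -0.90, -0.30, -0.10, -0.50))
lower_limit <- -0.684   # lower LOA
upper_limit <- -0.240   # upper LOA

# Keep only rows whose difference falls outside the limits of agreement
outliers <- subset(toy, diff < lower_limit | diff > upper_limit)
```

Here only rows 2 and 4 fall outside the interval; with the swapped limits, every toy row would have been flagged.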
df_PreDL <- df_predl %>%
  mutate(
    average = (dig + man) / 2,
    difference = dig - man
  )


ba_PreDL <- ggplot(df_PreDL, aes(x = average, y = difference, color=SPP)) +
  geom_point(alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "solid", color = "black") +
  geom_hline(yintercept = mean(df_PreDL$difference, na.rm = TRUE), color = "red", linetype = "dashed") +  # Mean line
  geom_hline(yintercept = mean(df_PreDL$difference, na.rm = TRUE) + c(-1.96, 1.96) * sd(df_PreDL$difference, na.rm = TRUE),
             color = "blue", linetype = "dashed") +  # Limits of agreement
  labs(title = "Bland-Altman Plot for PreDL",
       x = "Average of Digital and Hand Measurements",
       y = "Difference (Digital - Hand)") +
  theme_minimal()

ba_PreDL

library(blandr)

blandr.statistics(df_PreDL$dig, df_PreDL$man)
## Bland-Altman Statistics
## =======================
## t = -75.204, df = 338, p-value = < 2.2e-16
## alternative hypothesis: true bias is not equal to 0
## 
## =======================
## Number of comparisons:  339 
## Maximum value for average measures:  3.204808 
## Minimum value for average measures:  2.05628 
## Maximum value for difference in measures:  0.2376452 
## Minimum value for difference in measures:  -0.9116021 
## 
## Bias:  -0.4623168 
## Standard deviation of bias:  0.113188 
## 
## Standard error of bias:  0.006147531 
## Standard error for limits of agreement:  0.01051643 
## 
## Bias:  -0.4623168 
## Bias- upper 95% CI:  -0.4502246 
## Bias- lower 95% CI:  -0.4744091 
## 
## Upper limit of agreement:  -0.2404683 
## Upper LOA- upper 95% CI:  -0.2197824 
## Upper LOA- lower 95% CI:  -0.2611542 
## 
## Lower limit of agreement:  -0.6841654 
## Lower LOA- upper 95% CI:  -0.6634795 
## Lower LOA- lower 95% CI:  -0.7048513 
## 
## =======================
## Derived measures:  
## Mean of differences/means:  -17.19129 
## Point estimate of bias as proportion of lowest average:  -22.48317 
## Point estimate of bias as proportion of highest average -14.42572 
## Spread of data between lower and upper LoAs:  0.4436971 
## Bias as proportion of LoA spread:  -104.1965 
## 
## =======================
## Bias: 
##  -0.4623168  ( -0.4744091  to  -0.4502246 ) 
## ULoA: 
##  -0.2404683  ( -0.2611542  to  -0.2197824 ) 
## LLoA: 
##  -0.6841654  ( -0.7048513  to  -0.6634795 )

PCA

In this analysis, I will compare the principal components after centering and scaling the data. A PCA will help us determine which variables (from either methodology) influence the variation in our data the most, without worrying about differences in scales or units.

Loadings

library(stringr)

z.scores <- raw2

z.scores[, c(3:24)] <- scale(z.scores[, 3:24], center = TRUE, scale = TRUE)

PCA <- prcomp(z.scores[, 3:24])

summary(PCA)
## Importance of components:
##                          PC1     PC2     PC3    PC4     PC5    PC6     PC7
## Standard deviation     4.182 1.21217 0.84996 0.7357 0.56329 0.5180 0.45818
## Proportion of Variance 0.795 0.06679 0.03284 0.0246 0.01442 0.0122 0.00954
## Cumulative Proportion  0.795 0.86180 0.89463 0.9192 0.93366 0.9458 0.95540
##                            PC8     PC9    PC10    PC11    PC12    PC13    PC14
## Standard deviation     0.42037 0.41587 0.33673 0.32661 0.30076 0.28121 0.25212
## Proportion of Variance 0.00803 0.00786 0.00515 0.00485 0.00411 0.00359 0.00289
## Cumulative Proportion  0.96343 0.97129 0.97644 0.98129 0.98540 0.98900 0.99189
##                           PC15    PC16    PC17    PC18    PC19    PC20    PC21
## Standard deviation     0.22165 0.19780 0.16764 0.16404 0.12458 0.10221 0.07463
## Proportion of Variance 0.00223 0.00178 0.00128 0.00122 0.00071 0.00047 0.00025
## Cumulative Proportion  0.99412 0.99590 0.99718 0.99840 0.99911 0.99958 0.99983
##                           PC22
## Standard deviation     0.06048
## Proportion of Variance 0.00017
## Cumulative Proportion  1.00000
(loadings <- PCA$rotation[, 1:5])
##                  PC1          PC2         PC3          PC4          PC5
## SL.dig    -0.2347778 -0.096996721  0.01693687 -0.006176149  0.062160581
## BD1.dig   -0.2284302  0.040438320  0.11828459 -0.082195318 -0.020081064
## BD2.dig   -0.2265082 -0.204230055  0.04686845 -0.114485959  0.014698263
## CPD.dig   -0.2331491  0.027815265  0.02843859  0.060726336  0.028640808
## CPL.dig   -0.2233653 -0.122526759 -0.01765293  0.051407052  0.087380963
## PreDL.dig -0.1795926 -0.455754637  0.38328152 -0.035114502  0.036990294
## DbL.dig   -0.1792735  0.492804564  0.15433728  0.093722366  0.132645722
## HL.dig    -0.1881629  0.037841635 -0.69302734 -0.129545756  0.040276382
## HD1.dig   -0.2314303  0.092691464 -0.15121241 -0.074290038 -0.049191518
## HD2.dig   -0.2202611  0.126740445 -0.37372814 -0.114658545 -0.058170881
## SnL.dig   -0.1887660 -0.209724253 -0.23206376  0.438310297  0.299520788
## OL.dig    -0.2176019 -0.051613729 -0.08223996 -0.187395317 -0.246012805
## SL.man    -0.2346467 -0.066007194  0.05419313 -0.013410667  0.043012548
## BD1.man   -0.2233880  0.137447421  0.15671013  0.048596434 -0.036769031
## CPD.man   -0.2275129  0.051199268  0.08915917  0.048228291  0.001670173
## CPL.man   -0.2229489 -0.084361388  0.02254979  0.033420529  0.088728475
## PreDL.man -0.2160456 -0.280047719  0.02145309 -0.055310926  0.063545028
## DbL.man   -0.1679576  0.536687777  0.19654399  0.059968129  0.115261547
## HL.man    -0.2113179  0.033215809  0.05468238 -0.198535844  0.394539908
## HD1.man   -0.2253717  0.081537288  0.14871885 -0.052250635  0.089215660
## SnL.man   -0.1857245 -0.004219492 -0.02498576  0.725115394 -0.468553383
## OL.man    -0.2025594  0.030649519  0.08639455 -0.347119680 -0.631563787
sorted.loadings.1 <- loadings[order(loadings[, 1]), 1]
myTitle <- "Loadings Plot for PC1" 
myXlab  <- "Variable Loadings"
dotchart(sorted.loadings.1, main=myTitle, xlab=myXlab, cex=1.5, col="red")

sorted.loadings.2 <- loadings[order(loadings[, 2]), 2]
myTitle <- "Loadings Plot for PC2" 
myXlab  <- "Variable Loadings"
dotchart(sorted.loadings.2, main=myTitle, xlab=myXlab, cex=1.5, col="red")

sorted.loadings.3 <- loadings[order(loadings[, 3]), 3]
myTitle <- "Loadings Plot for PC3" 
myXlab  <- "Variable Loadings"
dotchart(sorted.loadings.3, main=myTitle, xlab=myXlab, cex=1.5, col="red")

VM_PCA <- varimax(PCA$rotation[, 1:3]) 

VM_PCA
## $loadings
## 
## Loadings:
##           PC1    PC2    PC3   
## SL.dig    -0.116 -0.220       
## BD1.dig   -0.235 -0.112       
## BD2.dig          -0.302       
## CPD.dig   -0.190 -0.120       
## CPL.dig          -0.231       
## PreDL.dig        -0.493  0.376
## DbL.dig   -0.469  0.276       
## HL.dig     0.168        -0.698
## HD1.dig   -0.142        -0.248
## HD2.dig                 -0.449
## SnL.dig          -0.267 -0.231
## OL.dig           -0.167 -0.147
## SL.man    -0.151 -0.197       
## BD1.man   -0.303              
## CPD.man   -0.227 -0.101       
## CPL.man   -0.118 -0.203       
## PreDL.man        -0.354       
## DbL.man   -0.505  0.315       
## HL.man    -0.190 -0.104       
## HD1.man   -0.270              
## SnL.man   -0.115 -0.114       
## OL.man    -0.197 -0.102       
## 
##                  PC1   PC2   PC3
## SS loadings    1.000 1.000 1.000
## Proportion Var 0.045 0.045 0.045
## Cumulative Var 0.045 0.091 0.136
## 
## $rotmat
##            [,1]        [,2]       [,3]
## [1,]  0.6917396  0.60222957  0.3985170
## [2,] -0.5560565  0.79629540 -0.2381487
## [3,] -0.4607575 -0.05686105  0.8857027
library(factoextra)
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
(eig.val <- get_eigenvalue(PCA))
##          eigenvalue variance.percent cumulative.variance.percent
## Dim.1  17.490167246      79.50076021                    79.50076
## Dim.2   1.469344948       6.67884067                    86.17960
## Dim.3   0.722436480       3.28380218                    89.46340
## Dim.4   0.541231220       2.46014191                    91.92354
## Dim.5   0.317290132       1.44222787                    93.36577
## Dim.6   0.268321832       1.21964469                    94.58542
## Dim.7   0.209931770       0.95423532                    95.53965
## Dim.8   0.176710702       0.80323046                    96.34288
## Dim.9   0.172949491       0.78613405                    97.12902
## Dim.10  0.113385668       0.51538940                    97.64441
## Dim.11  0.106671751       0.48487160                    98.12928
## Dim.12  0.090458276       0.41117398                    98.54045
## Dim.13  0.079078144       0.35944611                    98.89990
## Dim.14  0.063562571       0.28892078                    99.18882
## Dim.15  0.049128830       0.22331286                    99.41213
## Dim.16  0.039123834       0.17783561                    99.58997
## Dim.17  0.028104434       0.12774743                    99.71772
## Dim.18  0.026908568       0.12231167                    99.84003
## Dim.19  0.015519246       0.07054203                    99.91057
## Dim.20  0.010446972       0.04748624                    99.95806
## Dim.21  0.005570293       0.02531951                    99.98337
## Dim.22  0.003657591       0.01662541                   100.00000
ind <- get_pca_ind(PCA)
head(ind$coord[,1:4])
##        Dim.1     Dim.2      Dim.3       Dim.4
## 1 -5.4067751 0.4298672  0.4151608  0.68096887
## 2 -5.9873007 1.2851793 -0.4499961  0.35356493
## 3 -8.8447937 0.4505076  0.3553199  0.79382477
## 4  0.3842836 1.0995520 -0.4421054  0.15666553
## 5 -3.6094489 0.6935377  0.4998015 -0.39342915
## 6 -3.9868310 0.8667640  0.2054317 -0.03320378
raw.p <- cbind(z.scores, ind$coord[,1:4])

Post-PCA tests

To compare dimensions 1-4 between the manual and digital measurements, I need to run a PCA on the digital-only and manual-only data sets, then perform a paired t-test on each dimension.

Let’s create the data sets

dig_p_long <- z.scores %>%
  select(ID, SPP, ends_with(".dig"))


man_p_long <- z.scores %>%
  select(ID, SPP, ends_with(".man"))

Now let’s run the individual PCAs

Digital PCA

PCA_dig <- prcomp(dig_p_long[, 3:14])

summary(PCA_dig)
## Importance of components:
##                           PC1     PC2     PC3     PC4     PC5     PC6     PC7
## Standard deviation     3.1141 0.97826 0.74710 0.56729 0.41175 0.34804 0.25763
## Proportion of Variance 0.8081 0.07975 0.04651 0.02682 0.01413 0.01009 0.00553
## Cumulative Proportion  0.8081 0.88787 0.93438 0.96120 0.97533 0.98542 0.99096
##                            PC8     PC9    PC10    PC11    PC12
## Standard deviation     0.21796 0.19369 0.11394 0.07637 0.06855
## Proportion of Variance 0.00396 0.00313 0.00108 0.00049 0.00039
## Cumulative Proportion  0.99491 0.99804 0.99912 0.99961 1.00000
(loadings_dig <- PCA_dig$rotation[, 1:5])
##                  PC1         PC2          PC3         PC4         PC5
## SL.dig    -0.3169909 -0.10611458  0.060604567 -0.02354075  0.18176533
## BD1.dig   -0.3049862  0.01844403  0.250670038 -0.08937686 -0.10696578
## BD2.dig   -0.3063687 -0.22926027  0.015047023 -0.21325192  0.09067490
## CPD.dig   -0.3113160  0.02702292  0.134490705  0.06974432  0.10056312
## CPL.dig   -0.3033025 -0.12560059  0.020502291 -0.01630232  0.47718294
## PreDL.dig -0.2422550 -0.64531966  0.171752342 -0.09719333  0.02497410
## DbL.dig   -0.2321796  0.50569082  0.572575906  0.32709467  0.04365808
## HL.dig    -0.2625901  0.30850000 -0.607101208 -0.21853627  0.16435216
## HD1.dig   -0.3120389  0.17140191  0.004180317 -0.07011570 -0.05451678
## HD2.dig   -0.2995263  0.29442543 -0.204803116 -0.16644442  0.01982239
## SnL.dig   -0.2592646 -0.18806057 -0.382995034  0.84022275 -0.14244593
## OL.dig    -0.2964414 -0.02385058 -0.014007596 -0.19509333 -0.81011359
dig.sorted.loadings.1 <- loadings_dig[order(loadings_dig[, 1]), 1]
dig.myTitle <- "Loadings Plot for PC1" 
dig.myXlab  <- "Variable Loadings"
dotchart(dig.sorted.loadings.1, main=dig.myTitle, xlab=dig.myXlab, cex=1.5, col="red")

dig.sorted.loadings.2 <- loadings_dig[order(loadings_dig[, 2]), 2]
dig.myTitle <- "Loadings Plot for PC2" 
dig.myXlab  <- "Variable Loadings"
dotchart(dig.sorted.loadings.2, main=dig.myTitle, xlab=dig.myXlab, cex=1.5, col="red")

dig.sorted.loadings.3 <- loadings_dig[order(loadings_dig[, 3]), 3]
dig.myTitle <- "Loadings Plot for PC3" 
dig.myXlab  <- "Variable Loadings"
dotchart(dig.sorted.loadings.3, main=dig.myTitle, xlab=dig.myXlab, cex=1.5, col="red")

# varimax on the digital-only loadings (PCA_dig, not the combined PCA)
dig.VM_PCA <- varimax(PCA_dig$rotation[, 1:3]) 

dig.VM_PCA
(dig.eig.val <- get_eigenvalue(PCA_dig))
##         eigenvalue variance.percent cumulative.variance.percent
## Dim.1  9.697460424      80.81217020                    80.81217
## Dim.2  0.956988897       7.97490747                    88.78708
## Dim.3  0.558155419       4.65129516                    93.43837
## Dim.4  0.321819658       2.68183049                    96.12020
## Dim.5  0.169536257       1.41280214                    97.53301
## Dim.6  0.121130526       1.00942105                    98.54243
## Dim.7  0.066371341       0.55309451                    99.09552
## Dim.8  0.047504771       0.39587309                    99.49139
## Dim.9  0.037517676       0.31264730                    99.80404
## Dim.10 0.012983044       0.10819203                    99.91223
## Dim.11 0.005832920       0.04860766                    99.96084
## Dim.12 0.004699067       0.03915889                   100.00000
dig.ind <- get_pca_ind(PCA_dig)
head(dig.ind$coord[,1:4])
##       Dim.1      Dim.2       Dim.3     Dim.4
## 1 -3.877405 -0.1360907  0.21810368 0.6037004
## 2 -4.350838  1.0993666  0.01943977 0.6087639
## 3 -6.089192  0.1481213  0.42598106 0.5018142
## 4  0.215744  0.9684140 -0.13146684 0.9619029
## 5 -2.633685  0.2573236  0.58017122 0.2108573
## 6 -3.078674  0.3356335  0.19729525 0.6931265
dig.p <- cbind(dig_p_long, dig.ind$coord[,1:4])

Manual PCA

PCA_man <- prcomp(man_p_long[, 3:12])

summary(PCA_man)
## Importance of components:
##                           PC1     PC2     PC3    PC4     PC5     PC6     PC7
## Standard deviation     2.8377 0.80736 0.67359 0.5225 0.42289 0.34999 0.30348
## Proportion of Variance 0.8052 0.06518 0.04537 0.0273 0.01788 0.01225 0.00921
## Cumulative Proportion  0.8052 0.87042 0.91580 0.9431 0.96098 0.97323 0.98244
##                            PC8     PC9    PC10
## Standard deviation     0.28419 0.27405 0.14043
## Proportion of Variance 0.00808 0.00751 0.00197
## Cumulative Proportion  0.99052 0.99803 1.00000
(loadings_man <- PCA_man$rotation[, 1:5])
##                  PC1          PC2          PC3         PC4         PC5
## SL.man    -0.3448760 -0.159264090  0.028818596  0.08348653  0.17222913
## BD1.man   -0.3336431  0.171772379 -0.041810129 -0.02260661  0.23746504
## CPD.man   -0.3381457  0.020375133 -0.022859381  0.02968760  0.21184581
## CPL.man   -0.3287681 -0.213919604 -0.004339041  0.15099603  0.37304666
## PreDL.man -0.3128472 -0.481495839  0.092930271  0.10276539  0.13586031
## DbL.man   -0.2603841  0.806413535 -0.039923819  0.07691051  0.12592852
## HL.man    -0.3152071  0.005916009  0.329640553  0.42089544 -0.72541972
## HD1.man   -0.3367866  0.089004745  0.107331634  0.10025382 -0.06396107
## SnL.man   -0.2773374 -0.093007562 -0.874263032 -0.13518487 -0.35690086
## OL.man    -0.3032190 -0.014288942  0.319579808 -0.86422398 -0.19464322
man.sorted.loadings.1 <- loadings_man[order(loadings_man[, 1]), 1]
man.myTitle <- "Loadings Plot for PC1" 
man.myXlab  <- "Variable Loadings"
dotchart(man.sorted.loadings.1, main=man.myTitle, xlab=man.myXlab, cex=1.5, col="red")

man.sorted.loadings.2 <- loadings_man[order(loadings_man[, 2]), 2]
man.myTitle <- "Loadings Plot for PC2" 
man.myXlab  <- "Variable Loadings"
dotchart(man.sorted.loadings.2, main=man.myTitle, xlab=man.myXlab, cex=1.5, col="red")

man.sorted.loadings.3 <- loadings_man[order(loadings_man[, 3]), 3]
man.myTitle <- "Loadings Plot for PC3" 
man.myXlab  <- "Variable Loadings"
dotchart(man.sorted.loadings.3, main=man.myTitle, xlab=man.myXlab, cex=1.5, col="red")

# varimax on the manual-only loadings (PCA_man, not the combined PCA)
man.VM_PCA <- varimax(PCA_man$rotation[, 1:3]) 

man.VM_PCA

(man.eig.val <- get_eigenvalue(PCA_man))
##        eigenvalue variance.percent cumulative.variance.percent
## Dim.1  8.05239960       80.5239960                    80.52400
## Dim.2  0.65182332        6.5182332                    87.04223
## Dim.3  0.45372921        4.5372921                    91.57952
## Dim.4  0.27302458        2.7302458                    94.30977
## Dim.5  0.17883881        1.7883881                    96.09816
## Dim.6  0.12249607        1.2249607                    97.32312
## Dim.7  0.09210306        0.9210306                    98.24415
## Dim.8  0.08076290        0.8076290                    99.05178
## Dim.9  0.07510089        0.7510089                    99.80278
## Dim.10 0.01972155        0.1972155                   100.00000
man.ind <- get_pca_ind(PCA_man)
head(man.ind$coord[,1:4])
##        Dim.1     Dim.2        Dim.3      Dim.4
## 1 -3.7784207 0.6451182 -0.396593632 0.06849222
## 2 -4.1217187 0.6603292  0.008902533 0.87860206
## 3 -6.4466569 0.1376030 -0.378551677 0.95660998
## 4  0.3309617 0.6450504  0.446021000 1.18793061
## 5 -2.4644018 0.5773132  0.605969577 0.29865214
## 6 -2.5445684 0.9381672  0.511011644 0.61730956
man.p <- cbind(man_p_long, man.ind$coord[,1:4])

Now let’s do the t-tests

T-tests

(pc1 <- t.test(dig.p$Dim.1, man.p$Dim.1, paired= TRUE, alternative = "two.sided"))
## 
##  Paired t-test
## 
## data:  dig.p$Dim.1 and man.p$Dim.1
## t = 7.2114e-16, df = 338, p-value = 1
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -0.08315722  0.08315722
## sample estimates:
## mean difference 
##    3.048687e-17
(pc2 <- t.test(dig.p$Dim.2, man.p$Dim.2, paired= TRUE, alternative = "two.sided"))
## 
##  Paired t-test
## 
## data:  dig.p$Dim.2 and man.p$Dim.2
## t = -2.1474e-15, df = 338, p-value = 1
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -0.07233049  0.07233049
## sample estimates:
## mean difference 
##   -7.896446e-17
(pc3 <- t.test(dig.p$Dim.3, man.p$Dim.3, paired= TRUE, alternative = "two.sided"))
## 
##  Paired t-test
## 
## data:  dig.p$Dim.3 and man.p$Dim.3
## t = 4.112e-17, df = 338, p-value = 1
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -0.1053799  0.1053799
## sample estimates:
## mean difference 
##    2.202945e-18
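One thing worth noting about the results above: the mean differences are on the order of 1e-17 with p = 1 because prcomp() centers every variable, so each column of PC scores has a mean of (numerically) zero by construction. A paired t-test of score means between two separately-run PCAs is therefore all but guaranteed to be non-significant, even for unrelated data. A minimal sketch on toy data (not the fish measurements):

```r
# Two PCAs on completely unrelated random data: prcomp() centers each
# variable, so every score column has mean ~0, and a paired t-test of
# the means comes out flat by construction.
set.seed(1)
a <- prcomp(matrix(rnorm(100), 20, 5))
b <- prcomp(matrix(rnorm(100), 20, 5))

colMeans(a$x)                                      # all numerically zero
t.test(a$x[, 1], b$x[, 1], paired = TRUE)$p.value  # ~1 despite unrelated data
```

So a flat paired t-test here mostly confirms the centering rather than agreement between digital and manual measurements; correlating the paired per-specimen scores, or comparing the loadings directly, may be more informative.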

Plots

PCA 1v2

library(AMR) 
library(ggplot2)
library(ggfortify)
library(ggbiplot)
## 
## Attaching package: 'ggbiplot'
## The following object is masked from 'package:ggfortify':
## 
##     ggbiplot
wes.yel <- "#E1AF00"
wes.blu <- "#78B7C5"

plot1 <- autoplot(PCA, data = z.scores, colour = "SPP",
                  loadings = FALSE, loadings.label = FALSE,
                  frame = TRUE, frame.type = "norm") +
  scale_color_manual(values = c("#E1AF00", "#78B7C5", "#856A9C")) +
  scale_fill_manual(values = c("#E1AF00", "#78B7C5", "#856A9C")) +
  ggtitle("PCA Plot of Morphology traits") +
  theme_classic()
plot1



#ggsave(plot1, filename = "pc1v2.png", bg="transparent", width = 180, height = 120, units = "mm", dpi = 300)

PCA 2v3

plot5 <- autoplot(PCA, x = 2, y = 3, data = z.scores, colour = "SPP",
                  loadings = FALSE, loadings.label = FALSE,
                  frame = TRUE, frame.type = "norm") +
  scale_color_manual(values = c("#E1AF00", "#78B7C5", "#856A9C")) +
  scale_fill_manual(values = c("#E1AF00", "#78B7C5", "#856A9C")) +
  ggtitle("PCA Plot of Morphology traits") +
  theme_classic()
plot5

#ggsave(plot5, filename = "pc2v3.png", bg="transparent", width = 180, height = 120, units = "mm", dpi = 300)

By method

If I want to color by method, I think I need to combine my individual PCAs.

#data frame with just BD1 and HD1 for dig
dig_p_long2 <- z.scores %>%
  select(ID, SPP, ends_with(".dig"))
dig_p_long2 <- subset(dig_p_long2, select = -c(BD2.dig, HD2.dig))

#data frame with just BD2 and HD2 for dig
dig_p_long3 <- z.scores %>%
  select(ID, SPP, ends_with(".dig"))
dig_p_long3 <- subset(dig_p_long3, select = -c(BD1.dig, HD1.dig))

# PCA on the BD1/HD1 subset (new name so PCA_dig from above isn't overwritten)
PCA_dig1 <- prcomp(dig_p_long2[, 3:12])

#PC scores for digital (BD1/HD1)
PCA.scores.dig <- as.data.frame(PCA_dig1$x)
PCA.scores.dig$method <- "dig"
PCA.scores.dig$ID <- dig_p_long2$ID
PCA.scores.dig$SPP <- dig_p_long2$SPP

PCA_dig2 <- prcomp(dig_p_long3[, 3:12])

#PC scores for digital (BD2/HD2)
PCA.scores.dig2 <- as.data.frame(PCA_dig2$x)
PCA.scores.dig2$method <- "dig"
PCA.scores.dig2$ID <- dig_p_long3$ID
PCA.scores.dig2$SPP <- dig_p_long3$SPP

#PC scores for manual
PCA.scores.man <- as.data.frame(PCA_man$x)
PCA.scores.man$method <- "man"
PCA.scores.man$ID <- man_p_long$ID
PCA.scores.man$SPP <- man_p_long$SPP

PCA.comb <- rbind(PCA.scores.dig, PCA.scores.man)
PCA.comb2 <- rbind(PCA.scores.dig2, PCA.scores.man)
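One caveat when overlaying scores from independently-run PCAs: the sign of each PC is arbitrary, so one run can come out mirrored relative to the other, which would make the method ellipses look artificially different. A small sketch (toy data, not the data frames above) of one way to detect and undo a sign flip:

```r
# The sign of each principal component is arbitrary; simulate a flipped
# run and realign it by checking the correlation of matching PCs.
set.seed(2)
x  <- matrix(rnorm(60), 20, 3)
p1 <- prcomp(x)
p2 <- prcomp(x)
p2$x <- -p2$x   # simulate a sign flip: still a valid PCA of the same data

flip    <- sign(diag(cor(p1$x, p2$x)))   # -1 wherever a PC is mirrored
aligned <- sweep(p2$x, 2, flip, `*`)     # flip those columns back
all.equal(aligned, p1$x)                 # TRUE
```

The same idea could be applied to the digital and manual score sets before rbind-ing them, e.g. flipping any manual PC whose scores correlate negatively with the corresponding digital PC.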

ggplot(data = PCA.comb, aes(x = PC1, y = PC2, color = method)) +
  geom_point(alpha = 0.8, size = 3) +
  stat_ellipse(aes(group = method), level = 0.95) +
  scale_color_manual(values=c("#E1AF00","#78B7C5"))+
  scale_fill_manual(values=c("#E1AF00","#78B7C5")) +
  labs(
    title = "PCA of Digital and Manual Measurements (BD/HD1)",
    x = "Principal Component 1",
    y = "Principal Component 2",
    color = "Method"
  ) +
  theme_minimal()

ggplot(data = PCA.comb, aes(x = PC2, y = PC3, color = method)) +
  geom_point(alpha = 0.8, size = 3) +
  stat_ellipse(aes(group = method), level = 0.95) +
  scale_color_manual(values=c("#E1AF00","#78B7C5"))+
  scale_fill_manual(values=c("#E1AF00","#78B7C5")) +
  labs(
    title = "PCA of Digital and Manual Measurements (BD/HD1)",
    x = "Principal Component 2",
    y = "Principal Component 3",
    color = "Method"
  ) +
  theme_minimal()

ggplot(data = PCA.comb2, aes(x = PC1, y = PC2, color = method)) +
  geom_point(alpha = 0.8, size = 3) +
  stat_ellipse(aes(group = method), level = 0.95) +
  scale_color_manual(values=c("#E1AF00","#78B7C5"))+
  scale_fill_manual(values=c("#E1AF00","#78B7C5")) +
  labs(
    title = "PCA of Digital and Manual Measurements (BD/HD2)",
    x = "Principal Component 1",
    y = "Principal Component 2",
    color = "Method"
  ) +
  theme_minimal()

ggplot(data = PCA.comb2, aes(x = PC2, y = PC3, color = method)) +
  geom_point(alpha = 0.8, size = 3) +
  stat_ellipse(aes(group = method), level = 0.95) +
  scale_color_manual(values=c("#E1AF00","#78B7C5"))+
  scale_fill_manual(values=c("#E1AF00","#78B7C5")) +
  labs(
    title = "PCA of Digital and Manual Measurements (BD/HD2)",
    x = "Principal Component 2",
    y = "Principal Component 3",
    color = "Method"
  ) +
  theme_minimal()